Processing Compressed Texts: A Tractability Border
Author
Abstract
What kinds of operations can be performed efficiently (without full unpacking) on compressed texts? In this paper we consider three fundamental problems: (1) checking the equality of two compressed texts, (2) checking whether one compressed text is a substring of another compressed text, and (3) computing the number of positions at which two compressed texts of the same length differ (the Hamming distance). We present an algorithm that solves the first problem in O(n^3) time and the second problem in O(n^2 m) time. Here n is the size of the compressed representation of the text (we consider representations by straight-line programs) and m is the size of the compressed representation of the pattern. Next, we prove that the third problem is #P-complete. Thus, we exhibit a pair of similar problems (equivalence checking and Hamming distance computation) that have radically different complexity on compressed texts. The algorithmic technique used for problems (1) and (2) also helps in computing minimal periods and covers of compressed texts.
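To make "without full unpacking" concrete, here is a minimal Python sketch of a straight-line program (SLP): a list of rules in which each rule is either a single symbol or the concatenation of two earlier rules. It is not the paper's algorithm. It only illustrates two basic operations (text length and random access to one symbol) that run in time bounded by the number of rules rather than by the length of the generated text. The names `lengths`, `char_at`, and `fibonacci_slp` are hypothetical and chosen for this illustration.

```python
from typing import Dict, Tuple, Union

# A rule is either a terminal symbol or a pair of names of earlier rules.
Rule = Union[str, Tuple[str, str]]


def lengths(slp: Dict[str, Rule]) -> Dict[str, int]:
    """Length of the text generated by each rule, computed without unpacking.

    Assumes the rules are listed so that every pair refers to earlier rules.
    """
    out: Dict[str, int] = {}
    for name, rule in slp.items():
        if isinstance(rule, str):
            out[name] = 1
        else:
            left, right = rule
            out[name] = out[left] + out[right]
    return out


def char_at(slp: Dict[str, Rule], lens: Dict[str, int], name: str, i: int) -> str:
    """Return the i-th symbol (0-based) of the text generated by rule `name`."""
    while True:
        rule = slp[name]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if i < lens[left]:
            name = left            # position falls in the left part
        else:
            i -= lens[left]        # skip the left part, descend right
            name = right


def fibonacci_slp(k: int) -> Dict[str, Rule]:
    """SLP with k rules generating the k-th Fibonacci word (F1 = b, F2 = a)."""
    slp: Dict[str, Rule] = {"F1": "b", "F2": "a"}
    for j in range(3, k + 1):
        slp[f"F{j}"] = (f"F{j - 1}", f"F{j - 2}")
    return slp


slp = fibonacci_slp(50)
lens = lengths(slp)
print(lens["F50"])                     # length of a text far too long to unpack
print(char_at(slp, lens, "F50", 10**10))
```

Here 50 rules describe a text of more than ten billion symbols, which is why the complexity bounds in the abstract are stated in terms of the SLP sizes n and m rather than the lengths of the uncompressed texts.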
Similar resources
On the Computational Complexity of Embedding of Compressed Texts
In this work we consider the well-known problem of processing compressed texts. We study the following question (called Embedding): is one compressed text a subsequence of another compressed text? We show that Embedding is both NP-hard and co-NP-hard.
Processing Text Files as Is: Pattern Matching over Compressed Texts, Multi-byte Character Texts, and Semi-structured Texts
Techniques in processing text files “as is” are presented, in which given text files are processed without modification. The compressed pattern matching problem, first defined by Amir and Benson (1992), is a good example of the “as-is” principle. Another example is string matching over multi-byte character texts, which is a significant problem common to oriental languages such as Japanese, Kore...
Exploring the tractability border in epistemic tasks
We analyse the computational complexity of comparing informational structures. Intuitively, we study the complexity of deciding queries such as the following: Is Alice’s epistemic information strictly coarser than Bob’s? Do Alice and Bob have the same knowledge about each other’s knowledge? Is it possible to manipulate Alice in a way that she will have the same beliefs as Bob? The results show ...
Deblocking Joint Photographic Experts Group Compressed Images via Self-learning Sparse Representation
JPEG is one of the most widely used image compression methods, but it causes annoying blocking artifacts at low bit rates. Sparse representation is an efficient technique that can solve many inverse problems in image processing applications such as denoising and deblocking. In this paper, a post-processing method is proposed for reducing JPEG blocking effects via sparse representation. In this ...
Solving Classical String Problems on Compressed Texts
Here we study the complexity of string problems as a function of the size of a program that generates the input. We consider straight-line programs (SLPs), since all algorithms on SLP-generated strings can be applied to processing LZ-compressed texts. The main result is a new algorithm for pattern matching when both a text T and a pattern P are presented by SLPs (so-called fully compressed pattern...